Hashing Graphs Trees Search Trees Indexing and Multiway Trees File Organization

Introduction

The Table abstract data type

Implementations of the table data structure

Hash Tables

Bucket Array

Hash Function

Hash Code

Compression Functions

Collision-Handling Schemes

Collision likelihoods and load factors for hash tables

Hash Table Implementation

A simple Hash Table in operation

Strategies for dealing with collisions

Linear Probing

Double Hashing

Complexity of hash tables

Hash tables are powerful data structures used in computer science to efficiently store and retrieve key-value pairs. They work by using a hash function to map keys to indices in an array, allowing for constant-time access on average. In this explanation, we'll delve into the workings of a hash table with a specific hash function and a small array size.


Let's imagine we have a small data array with a size of 11, and our keys are 3-character strings composed of uppercase letters from A to Z. To map these strings to indices in our array, we'll define a hash function. Our hash function will convert each character of the string into a number (0 to 25, corresponding to their position in the alphabet) and then combine these numbers using a base-26 conversion scheme.


Here's how it works:


1. Each character in the string is converted to a number from 0 to 25 based on its position in the alphabet. For example, 'A' becomes 0, 'B' becomes 1, and so on up tob 'Z' becoming 25.

2. We treat the string as a base-26 number, where the first character's value is multiplied by 262 (262), the second character's value is multiplied by 261 (26), and the third character's value is multiplied by 260 (1). We sum these products to get a single integer value representing the entire string.

3. To ensure this integer value falls within the range of our array size (0 to 10), we take its modulo with 11 (k % 11).


Let's consider an example to illustrate this process:

Suppose we want to hash the string "ABC".
'A' maps to 0, 'B' maps to 1, and 'C' maps to 2.
Using our base-26 conversion, the integer value for "ABC" is calculated as (0 * 262) + (1 * 261) + (2 * 260) = 0 + 26 + 2 = 28.
Taking modulo 11, we get 28 % 11 = 6.


So, "ABC" would be hashed to index 6 in our array.

Now, let's proceed with inserting some keys into our hash table:

1. Inserting Keys:


We have a list of 3-letter airport codes: PHL, ORY, GCM, HKG, GLA, AKL, FRA, LAX, DCA.


2. Hashing the Keys:


We apply our hash function to each key to determine their respective indices in the array.
PHL hashes to index 4.
ORY hashes to index 8.
GCM hashes to index 6.
HKG hashes to index 4 (collision with PHL).
GLA hashes to index 8 (collision with ORY).
AKL hashes to index 7.
FRA hashes to index 5.
LAX hashes to index 1.
DCA hashes to index 1 (collision with LAX).


3. Handling Collisions:
We encounter collisions when two keys hash to the same index. To address collisions, we need a strategy for resolving them. One common approach is to use separate chaining, where each index in the array holds a linked list of key-value pairs that hashed to that index.


4. Final Hash Table:
After handling collisions, our hash table might look like this:


Index 0:
Index 1: LAX -> DCA
Index 2:
Index 3: Index 4: PHL -> HKG
Index 5: FRA
Index 6: GCM
Index 7: AKL
Index 8: ORY -> GLA
Index 9:
Index 10:


In this hash table, keys that hashed to the same index are stored in the same slot, forming a linked list structure. This allows for efficient retrieval of values even in the presence of collisions.

Hash tables are fundamental data structures used in various applications such as database indexing, caching, and symbol tables in programming languages. Understanding how they work and how to handle collisions is crucial for designing efficient algorithms and systems.

Load Factor


The load factor of a hash table is the ratio of the number of entries (n) to the size of the table (N). It indicates how full the hash table is and is calculated as λ = n/N. Keeping the load factor below certain thresholds ensures efficient operation of the hash table. For open-addressing schemes, maintaining λ < 0.5 is recommended, while for separate chaining, λ < 0.9 is advisable. When the load factor approaches 1, hash table operations can become inefficient due to increased clustering of items, leading to longer probing times and potentially linear expected running times for operations.


Hash Table


Hash tables, known as hashing, offer speedy insertions, deletions, and finds with constant average time. Unlike binary search trees, they excel at unordered operations but lack efficiency in tasks like finding minimum or maximum values and printing the entire table in sorted order.


Collision


Hash functions assign unique addresses to keys, but collisions occur when multiple keys map to the same address. This means two records end up in the same spot. Creating a perfect hash function is challenging, making collisions inevitable.